    Reinforcement Learning-based Optimization of Multiple Access in Wireless Networks

    In this thesis, we study the problem of Multiple Access (MA) in wireless networks and design adaptive solutions based on Reinforcement Learning (RL). We analyze the importance of MA in the current communications landscape, where bandwidth-hungry applications emerge due to the co-evolution of technological progress and societal needs, and explain that improvements brought by new standards cannot overcome the problem of resource scarcity. We focus on resource-constrained networks, where devices have restricted hardware capabilities, there is no centralized point of control, and coordination is prohibited or limited. The protocols that we optimize follow a Random Access (RA) approach, where sensing the common medium prior to transmission is not possible. We begin with the study of time access and provide two reinforcement learning algorithms for optimizing Irregular Repetition Slotted ALOHA (IRSA), a state-of-the-art RA protocol. First, we focus on ensuring low complexity and propose a Q-learning variant where learners act independently and converge quickly. We then design an algorithm in the area of coordinated learning, focusing on deriving convergence guarantees while minimizing the complexity of coordination. We provide simulations that showcase how coordination can strike a fine balance, in terms of complexity and performance, between fully decentralized and centralized solutions. In addition to time access, we study channel access, a problem that has recently attracted significant attention in cognitive radio. We design learning algorithms in the framework of Multi-player Multi-armed Bandits (MMABs), both for static and dynamic settings, where devices arrive at different time steps. Our focus is on deriving theoretical guarantees and ensuring that performance scales well with the size of the network. Our works constitute an important step towards addressing the challenges that the properties of decentralization and partial observability, inherent in resource-constrained networks, pose for RL algorithms.

    Grounding Artificial Intelligence in the Origins of Human Behavior

    Recent advances in Artificial Intelligence (AI) have revived the quest for agents able to acquire an open-ended repertoire of skills. However, although this ability is fundamentally related to the characteristics of human intelligence, research in this field rarely considers the processes that may have guided the emergence of complex cognitive capacities during the evolution of the species. Research in Human Behavioral Ecology (HBE) seeks to understand how the behaviors characterizing human nature can be conceived as adaptive responses to major changes in the structure of our ecological niche. In this paper, we propose a framework highlighting the role of environmental complexity in open-ended skill acquisition, grounded in major hypotheses from HBE and recent contributions in Reinforcement Learning (RL). We use this framework to highlight fundamental links between the two disciplines, as well as to identify feedback loops that bootstrap ecological complexity and create promising research directions for AI researchers.

    Fast Q-learning for Improved Finite Length Performance of Irregular Repetition Slotted ALOHA

    In this paper, we study the problem of designing adaptive Medium Access Control (MAC) solutions for wireless sensor networks (WSNs) under the Irregular Repetition Slotted ALOHA (IRSA) protocol. In particular, we optimize the degree distribution employed by IRSA for finite frame sizes. Motivated by characteristics of WSNs, such as restricted computational resources and partial observability, we model the design of IRSA as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). We have theoretically analyzed our solution in terms of optimality of the learned IRSA design and derived guarantees for finding near-optimal policies. These guarantees are generic and can be applied in resource allocation problems that exhibit the waterfall effect, which in our setting manifests itself as a severe degradation in the overall throughput of the network above a particular channel load. Furthermore, we combat the inherent non-stationarity of the learning environment in WSNs by advancing classical Q-learning through the use of virtual experience (VE), a technique that enables the update of multiple state-action pairs per learning iteration and, thus, accelerates convergence. Our simulations confirm the superiority of our learning-based MAC solution compared to traditional IRSA and provide insights into the effect of WSN characteristics on the quality of learned policies.
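    The virtual-experience idea described above can be sketched in a few lines: one observed transition drives updates not only to the visited state-action pair, but to every pair declared equivalent to it. A minimal tabular sketch, assuming a hypothetical designer-supplied `equivalent` function (the paper's actual equivalence classes come from the structure of the IRSA model):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update for one observed (s, a, r, s')."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def q_update_virtual(Q, s, a, r, s_next, equivalent, alpha=0.1, gamma=0.95):
    """Virtual experience: besides the observed pair (s, a), also update
    every state-action pair that `equivalent(s, a)` declares statistically
    equivalent, so one real sample updates several table entries."""
    for s_v, a_v in [(s, a)] + equivalent(s, a):
        q_update(Q, s_v, a_v, r, s_next, alpha, gamma)
```

    With an equivalence class of size k, each real sample performs k+1 table updates, which is where the accelerated convergence comes from.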

    Robust Coordinated Reinforcement Learning for MAC Design in Sensor Networks

    In this paper, we propose a medium access control (MAC) design method for wireless sensor networks based on decentralized coordinated reinforcement learning. Our solution maps the MAC resource allocation problem first to a factor graph, and then, based on the dependencies between sensors, transforms it into a coordination graph, on which the max-sum algorithm is employed to find the optimal transmission actions for sensors. We have theoretically analyzed the system and determined the convergence guarantees for decentralized coordinated learning in sensor networks. As part of this analysis, we derive a novel sufficient condition for the convergence of max-sum on graphs with cycles and employ it to render the learning process robust. In addition, we reduce the complexity of applying max-sum to our optimization problem by expressing coordination as a multiple knapsack problem (MKP). The complexity of the proposed solution can thus be bounded by the capacities of the MKP. Our simulations reveal the benefits coming from adaptivity and sensors' coordination, both inherent in the proposed learning-based MAC.
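    On a cycle-free coordination graph, max-sum reduces to simple message passing; the following minimal two-sensor sketch uses a hypothetical pairwise utility table standing in for the learned local Q-values (the paper's contribution concerns the harder case of graphs with cycles, where convergence is not guaranteed without extra conditions):

```python
import numpy as np

# Hypothetical pairwise utility between two neighbouring sensors' actions,
# standing in for the learned local Q-values on one edge of the graph.
U = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# With two nodes and one factor, max-sum needs a single message each way:
# the message to a node reports, for each of its actions, the best utility
# achievable over the neighbour's actions.
msg_to_i = U.max(axis=1)   # for each a_i, best over a_j
msg_to_j = U.max(axis=0)   # for each a_j, best over a_i

a_i = int(np.argmax(msg_to_i))   # each node maximises its incoming message
a_j = int(np.argmax(msg_to_j))
# Both nodes pick action 1, recovering the optimal joint utility U[1, 1] = 3.
```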

    Fast reinforcement learning for decentralized MAC optimization

    In this paper, we propose a novel decentralized framework for optimizing the transmission strategy of the Irregular Repetition Slotted ALOHA (IRSA) protocol in sensor networks. We consider a hierarchical communication framework that ensures adaptivity to changing network conditions and does not require centralized control. The proposed solution is inspired by the reinforcement learning literature, and, in particular, Q-learning. To deal with sensor nodes' limited lifetime and communication range, we allow them to decide how many packet replicas to transmit considering only their own buffer state. We show that this information is sufficient and can help avoid packet collisions and significantly improve throughput. We solve the problem using the decentralized partially observable Markov Decision Process (Dec-POMDP) framework, where we allow each node to decide independently of the others how many packet replicas to transmit. We enhance the proposed Q-learning based method with the concept of virtual experience, and we theoretically and experimentally prove that convergence time is, thus, significantly reduced. The experiments prove that our method leads to large throughput gains, in particular when network traffic is heavy, and scales well with the size of the network. To comprehend the effect of the problem's nature on the learning dynamics and vice versa, we investigate the waterfall effect, a severe degradation in performance above a particular traffic load, typical for codes-on-graphs, and prove that our algorithm learns to alleviate it.
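    A per-node learner of this kind can be sketched as follows; the state space (the node's own buffer occupancy), action space (replica count), and epsilon-greedy exploration are illustrative choices, not the paper's exact design:

```python
import random

class ReplicaLearner:
    """Per-node Q-learner: state = the node's own buffer occupancy,
    action = number of packet replicas to transmit in the next frame.
    State/action ranges and the reward shape are illustrative."""

    def __init__(self, max_buffer=5, max_replicas=3,
                 alpha=0.1, gamma=0.9, eps=0.1):
        self.actions = list(range(1, max_replicas + 1))
        self.Q = {(s, a): 0.0 for s in range(max_buffer + 1)
                              for a in self.actions}
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, buffer_state):
        if random.random() < self.eps:          # occasional exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(buffer_state, a)])

    def learn(self, s, a, reward, s_next):
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (reward
                                        + self.gamma * best_next
                                        - self.Q[(s, a)])
```

    Each node runs its own instance and never needs to observe other nodes' buffers, matching the decentralized, partially observable setting.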

    Collision Resolution in Multi-player Bandits Without Observing Collision Information

    The absence of collision information in Multi-player Multi-armed Bandits (MMABs) renders arm availabilities partially observable, impeding the design of communication-free algorithms with regret guarantees. In this work, we propose a collision resolution (CR) mechanism for MMABs inspired by sequential interference mechanisms employed in communication protocols. In the general case, our collision resolution mechanism assumes that players can pull multiple arms during the exploration phase. We, thus, propose a novel MMAB model that captures this while still considering strictly bandit feedback and single pulls during the exploitation phase. We theoretically analyze the CR mechanism using tools from information theory in order to prove the existence of an upper bound on the probability of its failure that decreases at a rate exponential in the number of players.
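    The modelling difficulty can be seen in a few lines: under strictly bandit feedback, a zero reward caused by a collision is indistinguishable from a zero reward drawn from an unrewarding arm. A minimal sketch of one such round (the arm means and Bernoulli reward model are illustrative):

```python
import random

def play_round(arm_means, pulls):
    """One round of a multi-player bandit with collisions. Players pulling
    the same arm all receive reward 0, and crucially there is no separate
    collision flag: a 0 could equally come from an unrewarding arm."""
    rewards = []
    for arm in pulls:
        if pulls.count(arm) > 1:     # collision, silently zeroed out
            rewards.append(0.0)
        else:                        # Bernoulli reward from the chosen arm
            rewards.append(float(random.random() < arm_means[arm]))
    return rewards
```

    A player observing only its own reward sequence cannot tell the two causes of a zero apart, which is exactly what a collision resolution mechanism must work around.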

    Robust multi-agent Q-learning in cooperative games with adversaries

    We present RoM-Q, a new Q-learning-like algorithm for finding policies robust to attacks in multi-agent systems (MAS). We consider a novel type of attack, where a team of adversaries, aware of the optimal multi-agent Q-value function, performs a worst-case selection of both the agents to attack and the actions to perform. Our motivation lies in real-world MAS where vulnerabilities of particular agents emerge due to their characteristics and robust policies need to be learned without requiring the simulation of attacks during training. In our simulations, where we train policies using RoM-Q, Q-learning and minimax-Q and derive corresponding adversarial attacks, we observe that policies learned using RoM-Q are more robust, as they accrue the highest rewards against all considered adversarial attacks.
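    The worst-case flavour of such a backup can be sketched for two agents: instead of simply maximising the joint-action Q-values, the team maximises the value that survives an adversary overriding one agent's action. This is a simplified illustration of the idea, not the exact RoM-Q update rule:

```python
import numpy as np

def robust_value(Q_s):
    """Worst-case state value for a 2-agent team with joint-action table
    Q_s[a0, a1]: for each candidate joint action, an adversary may attack
    exactly one agent and replace its action, so the team scores the
    minimum over attacks and picks the joint action maximising it."""
    n0, n1 = Q_s.shape
    best = -np.inf
    for a0 in range(n0):
        for a1 in range(n1):
            worst = min(Q_s[:, a1].min(),   # adversary overrides agent 0
                        Q_s[a0, :].min())   # adversary overrides agent 1
            best = max(best, worst)
    return best

# Example: the greedy joint optimum (value 3.0) is fragile under attack,
# while the robust backup settles on a joint action guaranteeing 2.0.
Q_s = np.array([[3.0, 1.0],
                [2.0, 2.0]])
```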

    Resilience - towards an interdisciplinary definition using information theory

    The term “resilience” has risen in popularity following a series of natural disasters, the impacts of climate change, and the Covid-19 pandemic. However, different disciplines use the term in widely different ways, resulting in confusion regarding how the term is used and difficulties in operationalising the underlying concept. Drawing on an overview of eleven disciplines, our paper offers a guiding framework to navigate this ambiguity by suggesting a novel typology of resilience using an information-theoretic approach. Specifically, we define resilience by borrowing an existing definition of individuals as sub-systems within multi-scale systems that exhibit temporal integrity amidst interactions with the environment. We quantify resilience as the ability of individuals to maintain fitness in the face of endogenous and exogenous disturbances. In particular, we distinguish between four different types of resilience: (i) preservation of structure and function, which we call “strong robustness”; (ii) preservation of function but change in structure (“weak robustness”); (iii) change in both structure and function (“strong adaptability”); and (iv) change in function but preservation of structure (“weak adaptability”). Our typology offers an approach for navigating these different types and demonstrates how resilience can be operationalised across disciplines.

    Dynamics of niche construction in adaptable populations evolving in diverse environments

    In studies of both natural and artificial systems, evolution is often seen as synonymous with natural selection. Individuals evolve under pressures set by environments that are either reset or do not carry over significant changes from previous generations. Thus, niche construction (NC), the reciprocal process to natural selection where individuals incur inheritable changes to their environment, is ignored. Arguably due to this lack of study, the dynamics of NC are today little understood, especially in real-world settings. In this work, we study NC in simulation environments that consist of multiple, diverse niches and populations that evolve their plasticity, evolvability and niche-constructing behaviors. Our empirical analysis reveals many interesting dynamics, with populations experiencing mass extinctions, arms races and oscillations. To understand these behaviors, we analyze the interaction between NC and adaptability and the effect of NC on the population's genomic diversity and dispersal, observing that NC diversifies niches. Our study suggests that complexifying the simulation environments studying NC, by considering multiple and diverse niches, is necessary for understanding its dynamics and can lend testable hypotheses to future studies of both natural and artificial systems.

    Plasticity and evolvability under environmental variability: the joint role of fitness-based selection and niche-limited competition

    The diversity and quality of natural systems have been a puzzle and inspiration for communities studying artificial life. It is now widely accepted that the adaptation mechanisms enabling these properties are largely influenced by the environments they inhabit. Organisms facing environmental variability have two alternative adaptation mechanisms operating at different timescales: plasticity, the ability of a phenotype to survive in diverse environments, and evolvability, the ability to adapt through mutations. Although vital under environmental variability, both mechanisms are associated with fitness costs hypothesized to render them unnecessary in stable environments. In this work, we study the interplay between environmental dynamics and adaptation in a minimal model of the evolution of plasticity and evolvability. We experiment with different types of environments characterized by the presence of niches and a climate function that determines the fitness landscape. We empirically show that environmental dynamics affect plasticity and evolvability differently and that the presence of diverse ecological niches favors adaptability even in stable environments. We perform ablation studies of the selection mechanisms to separate the role of fitness-based selection and niche-limited competition. Results obtained from our minimal model allow us to propose promising research directions in the study of open-endedness in biological and artificial systems.